
ISO/IEC JTC1/SC2/WG2 N3377
L2/08-024
2008-01-25

Universal Multiple-Octet Coded Character Set
International Organization for Standardization
Organisation Internationale de Normalisation
Международная организация по стандартизации

Doc Type: Working Group Document
Title: Proposal to add the Samaritan alphabet to the BMP of the UCS
Source: UC Berkeley Script Encoding Initiative (Universal Scripts Project)
Authors: Michael Everson & Mark Shoulson
Status: Individual Contribution
Action: For consideration by JTC1/SC2/WG2 and UTC
Date: 2008-01-25

1. Historical background. Both the Hebrew and the Samaritan scripts ultimately derive from the Phoenician, but by different routes. According to Naveh 1997, by 1050 BCE, the Phoenician script had lost all of the pictographic features which were present in Proto-Canaanite. Phoenician script was adopted by speakers of Aramaic and Hebrew. Hebrew writing began to take on unique features (i.e. those of Palaeo-Hebrew) by the mid-ninth century BCE, and Aramaic writing began to take on its own features by the middle of the eighth century BCE. The destruction of the First Temple and the exile of educated Hebrew speakers to Babylonia changed things greatly, according to Naveh (p. 78). Later generations returned to Judah, by then a Persian province, where Aramaic was official; many of these people were bilingual in Aramaic and Hebrew, and had given up the Palaeo-Hebrew script which they had used prior to the exile, writing instead in a script derived from Aramaic (pp. 112 ff.). They went on to modify this script until, by the second century CE, it had developed into the Jewish script which became the Square Hebrew used today. The abandonment of one script for another (even if the two scripts are related) is complex, particularly with regard to conservative cultures such as that of the Jews.
Naveh suggests that, although Aramaic script was very widespread during the Persian period (indeed being the international script par excellence), it was not until the official language of the Persian government had become Greek that the by-then-familiar Aramaic came to be modified into the uniquely Jewish script which we know today as the Hebrew encoded in the UCS. Apparently some differentiation in function arose between the use of the Aramaic-derived writing (= Square Hebrew) vs. Hebrew-derived writing (Phoenician or Palaeo-Hebrew), with the Pharisees apparently disapproving of the Hebrew-derived script. Naveh quotes from the Babylonian Talmud, Sanhedrin 21b:

"Originally the Torah was given to Israel in the Hebrew script and in the sacred language; later, in the time of Ezra, the Torah was given in the Assyrian script [i.e. the Aramaic script, introduced by the Assyrians as an official script] and the Aramaic language. They selected for Israel the Assyrian script and the Hebrew language, leaving the Hebrew script and the Aramaic language for the ordinary people."

The Samaritans, who had not gone into exile, did not give up their Palaeo-Hebrew tradition, and continue to use a variety of this script to the present day. According to Naveh, they believe that they are "the true descendants of the sons of Israel"; Rav Hisda explained (in the third century CE) that they are the "ordinary people" referred to in the Babylonian Talmud cited above. Uniquely Samaritan script features (as distinct from Phoenician/Palaeo-Hebrew) are discernible by the third century CE. Modern Samaritans continue to make use of this script, and a weekly newspaper Å Ä (A.B.) is published in Israel in Samaritan script (along with short articles in Hebrew and Arabic).

2. Corpus. There are some hundreds of Samaritan manuscripts; one of the largest collections is in the John Rylands University Library at the University of Manchester, including 377 items on parchment and paper. Samaritan MSS 1-27 were acquired in 1901 with the Crawford collection and include what is apparently the earliest dated manuscript (1211 CE) of the whole Samaritan Pentateuch to be found outside Nablus, as well as six other Pentateuchs in whole or in part (two bilingual), three noteworthy theological codices, and interesting liturgical and astronomical texts. Samaritan MSS 28-375 are from the collection of Dr Moses Gaster, acquired by the Library in 1954. Among them are manuscripts of the Pentateuch (including bilingual and trilingual texts), commentaries and treatises, and liturgical, historical, chronological and astronomical codices. There are detailed census lists of the Samaritans and lists of manuscripts in their possession. The Library also holds the substantial, but uncatalogued, correspondence of Dr Gaster with the Samaritan community in Nablus, in Hebrew but written in the Samaritan script. Some other important Samaritan manuscripts are found at the Chester Beatty Library in Dublin (dating to 1211 CE) and at the New York Public Library (dating to 1232 CE).

3. Structure. Samaritan is a right-to-left script. It does not ligate its letters as many right-to-left scripts do, and it does not have explicit final consonants as Hebrew does.

4. Vowels and other marks of pronunciation. Vowel signs are used optionally in Samaritan, as points are used optionally in Hebrew.
In modern times, overlong vowels (marked here with circumflex and colon) and long vowels (marked here with circumflex) are distinguished from short vowels by the size of the diacritic. The default vowel sign to be used in transcribing text which does not make the distinction is the smallest one.

Transliteration   Vowel sign
ê                 LONG E
e                 E
û                 LONG U
u                 U
åˆ:               OVERLONG AA
åˆ                LONG AA
å                 AA
î                 LONG I
i                 I
â:                OVERLONG A
â                 LONG A
a                 A
o                 O
ă                 SHORT A

4.1. The general behaviour of vowel signs. These vowel signs are combining characters, each resting to the left of its base consonant, effectively centred between its base consonant and the following one (if any). Examples using the letters YUT, QUF, DALAT, and IY, reading from right to left: yêqed, yåˆ:qåˆdåh, yâ:qâdah, yăqăd, yûqud, yîqid, yoqod: É í â É í â É í â É í â Ñ É í â ÑüÉûíùâ Éúíõâ

4.2. The behaviour of consonant modifiers. The four marks SUKUN, DAGESH, OCCLUSION, and NEQUDAA are centred over the base letter. Examples are, reading from right to left: yâqdah, yêqqed, ḥåˆbbåh, yûq ud: É í â ÑüòôÖûá Éôúíõâ Ñ É í â

These marks modify the consonant and precede the vowel signs where used. The SUKUN indicates that no vowel follows the consonant. The DAGESH indicates consonant gemination; NEQUDAA is an editorial mark which indicates that there is a variant reading of the word. The mark for OCCLUSION strengthens the consonant, as here where Ö w becomes òö b. Note that in the example ÑüòôÖûá ḥåˆbbåh, DAGESH stacks atop OCCLUSION òôö bb, reflecting the preferred encoding order since consonant quality precedes consonant length. The mark for OCCLUSION also has a secondary use, for instance, to mark personal names to distinguish them from homographs. So Öòîè îšab "Esau" contrasts with Öîè âšu "they made". (Obviously with full pointing these words can also be distinguished by their vowels.) The character properties should, if possible, support the priority of these marks over the vowel signs.

4.3. The behaviour of IN and IN-ALAF. The two marks IN and IN-ALAF are used to indicate the presence of [ʕ] (Samaritan in, Hebrew ayin). These are also encoded immediately following their base letter and before any vowel signs. They are drawn to the right side of their base letter. Examples, reading from right to left: ḥyk, ḥ ayåh: Ñüûâ ñá äâá óä

4.4. The behaviour of EPENTHETIC YUT. The EPENTHETIC YUT, transcribed ỹ here, represents a kind of glide-vowel which interacts with another vowel sign. It was originally used only with the consonants Ä ALAF, Ñ IY, á IT, and è IN (Hebrew alef, he, ḥet, ayin); those letters used to serve to separate syllables, but lost their sound. The behaviour of the combining epenthetic YUT is the same as that of DAGESH and the other consonant modifiers mentioned above: it is centred above the consonant, with the following vowel sign centred more or less between it and the following letter.
Examples, reading from right to left, bâ ỹåˆr, mi ỹåˆḥûriy, mihỹåˆḥelåk, miḥỹowṭ, mi ỹăl: ã öè å àö öá å äüãúáû öñ å â ì áûöä å ìûöä Å

To make it clearer what is written here, hyphens might be inserted in the transliterations to show the pattern. A consonant (ʾ, h, ḥ, ʿ) may be followed by one or two combining marks, as seen here (examples with some of the other combining marks shown above are given for comparison):

bâ ỹåˆ r            yê qe d           yâ q da h
mi ỹåˆ ḥû ri y      yåˆ: qåˆ då h     yê q e d (yêqqed)
mi hỹåˆ ḥe lå k     yâ: qâ da h       ḥåˆ w å h (ḥåˆbbåh)
mi ḥỹo w ṭ          yă qă d           yû q u d
mi- ỹă-l            yû qu d           ḥ y k
                    yî qi d           ḥ a yå h
                    yo qo d

When epenthetic YUT is not fixed to one of the four consonants listed above, a different behaviour was innovated (though not recently), in which the mark for the epenthetic YUT is treated as a spacing character of its own, capable of bearing its own diacritical mark. We transliterate the MODIFIER LETTER EPENTHETIC YUT as Ỹ below. Examples, reading from right to left: wutiỹâzal, miỹăsfåˆriy: â ìüê é ö å mi Ỹă s fåˆ ri y â- ì-üê- é- ö- å ã Üüö ïßö wu ti Ỹâ za l ã- Ü-üö- ï-ßö
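The ordering described in sections 4.2 to 4.4 (a base character, then any consonant modifiers, then any vowel signs) can be illustrated with a short sketch. The Python below is illustrative only: it uses the draft code points from this proposal's own chart, the set names and both helper functions are hypothetical, and the sample sequence is made up for the demonstration rather than taken from a manuscript. SUKUN is grouped with the consonant modifiers here, following section 4.2.

```python
# Draft code points taken from this proposal's chart (Row 08). The grouping
# into sets and the function names are assumptions made for illustration.
BASES = set(range(0x0800, 0x0816)) | {0x081A, 0x0824, 0x0828}  # letters and spacing modifier letters
CONSONANT_MARKS = {0x0816, 0x0817, 0x0818, 0x0819, 0x081B, 0x082C, 0x082D}
VOWEL_SIGNS = set(range(0x081C, 0x0824)) | {0x0825, 0x0826, 0x0827, 0x0829, 0x082A, 0x082B}


def split_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        cp = ord(ch)
        if cp in BASES:
            clusters.append(ch)
        elif cp in CONSONANT_MARKS or cp in VOWEL_SIGNS:
            if not clusters:
                raise ValueError("combining mark U+%04X has no base" % cp)
            clusters[-1] += ch
        else:
            raise ValueError("U+%04X is not handled by this sketch" % cp)
    return clusters


def marks_precede_vowels(cluster):
    """Return True if no consonant modifier comes after a vowel sign."""
    seen_vowel = False
    for ch in cluster[1:]:
        cp = ord(ch)
        if cp in VOWEL_SIGNS:
            seen_vowel = True
        elif cp in CONSONANT_MARKS and seen_vowel:
            return False
    return True


# Made-up sequence: IT + LONG AA, BAA + OCCLUSION + DAGESH + AA, IY.
sample = "\u0807\u081F\u0805\u0818\u0819\u0820\u0804"
clusters = split_clusters(sample)
print([len(c) for c in clusters])                       # [2, 4, 1]
print(all(marks_precede_vowels(c) for c in clusters))   # True
```

A sequence such as BAA + AA + DAGESH would be reported as out of order by `marks_precede_vowels`, since the consonant modifier follows the vowel sign.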

At some point in the discussions leading to this proposal, the possibility was raised of attaching the combining MARK EPENTHETIC YUT to a NBSP, but this would in effect break single words into two, separating prefixes and marks of conjugation from the root of the word: mi Øỹă s fåˆ ri y wu ti Øỹâ za l

We believe that the simpler model C((M)V), where a consonant may optionally be followed by a combining mark (consonant sign or vowel sign) and optionally by a vowel sign, is preferable to a solution which makes use of NBSP. Here are two examples where a single consonant is followed by two vowel signs, reading from left to right, eumer "I will say", hå-inšem "the women": åúî ç Ñ ìúå Ä ãúâî â håi n še m eu me r yi-šae-l

4.5. The behaviour of vowels ă and i in word-initial position. Two vowels are known to occur in initial position, before the base character. These are encoded as spacing modifier letters because combining characters cannot occur in initial position in a word. Users concerned with spoofing possibilities should note the similarity between MODIFIER LETTER SHORT A and @ VOWEL SIGN SHORT A and between ß MODIFIER LETTER I and @ VOWEL SIGN I. (It is extremely improbable that the user community, which is very small, will require Samaritan script in IDN or similar applications.) Examples, reading from right to left: ălfåniy "before" and â îçß inšiy "wives of", using Ă and I in the transcription for the modifier letters: â îçß â çüêã I n ši y Ă l få ni y

The MODIFIER LETTER SHORT A also has an additional function: when used following a letter used numerically, it indicates the thousands, so Ç = 3000. This is similar to the use of HEBREW PUNCTUATION GERESH for the same thing (fi = 3000). Hebrew GERESH serves a number of functions. It modifies the sound of a letter (ƒ ÿ fi ǧirafa "giraffe"; fi tšips "chips"); it marks abbreviations (fi Œ, short for Í Œ mispar "number"); and in transliterations of Samaritan GERESH is used for the syllable-initial ă, as in fi for ÉúÇ ç ănged.
The MODIFIER LETTER SHORT A similarly has multiple functions (though the ABBREVIATION MARK is used with abbreviations in Samaritan).

4.5.1 Alternatives previously proposed for this behaviour. It has been suggested that the distinction between MODIFIER LETTER SHORT A and COMBINING SHORT A might cause difficulty for users, for instance in searching operations. This suggestion does not seem convincing to us. The word â îçß inšiy here could be represented, structurally, in Devanagari and Latin, completely isomorphically with the encoding model proposed here (apart from the irrelevant virama); the Latin example uses U+2071 SUPERSCRIPT LATIN SMALL LETTER I and U+0365 COMBINING LATIN SMALL LETTER I:

ß + ç + î + @ + â = â îçß
ⁱ + n + š + ◌ͥ + y = ⁱnšͥy
+ ˇ + + @ + ˇ = ˇ

The concurrent use of combining marks and spacing marks that look very similar is also not unique to Samaritan. In the orthography of Oowekyala, a North Wakashan language spoken in British Columbia, both spacing U+02BC MODIFIER LETTER APOSTROPHE and non-spacing U+0313 COMBINING COMMA ABOVE are used together to indicate glottalization. Among the consonants, plain resonants m n l y w contrast with glottalized resonants m n l y w. Among the vowels, plain vowels əm ən əl i u contrast with two sets of glottalized vowels: əmm ənn əll iy uw are used when any other vowel follows, and əm ən əl i u are used word-finally or when an obstruent follows. Compare ǧəm s "to lie on the ground" with ǧəmm ìs "to lie on the beach".

John Hudson suggested that in order to avoid duplicating MODIFIER LETTER SHORT A with VOWEL SIGN SHORT A or MODIFIER LETTER I with VOWEL SIGN I, the generally-accepted UCS combining-character model might be abandoned for Samaritan, and all of its many combining diacritical marks might be represented by effectively spacing modifier letters that could be rendered correctly by a smart font. While such an encoding model could be made to work, we do not believe it is in the interests of the Samaritan user community itself or of the UCS user community in general to do this, and indeed, we do not believe that MODIFIER LETTER SHORT A and VOWEL SIGN SHORT A are duplicates any more than U+02BC MODIFIER LETTER APOSTROPHE and U+0313 COMBINING COMMA ABOVE are. All of the rest of the Semitic scripts follow the standard UCS encoding model. We can imagine no advantage for Samaritan to differ from this model. While word-initial i- and a- are not rare in Samaritan, neither are they a dime a dozen. And this is certainly no reason to abandon a well-understood encoding model for a novel one. Anyone implementing Samaritan will be familiar with Hebrew if not also Arabic. Since spacing MODIFIER LETTER SHORT A is already required in Samaritan as a kind of GERESH to indicate the numeric use of Samaritan letters, the only anomaly to this encoding model is the glyph similarity of MODIFIER LETTER I with VOWEL SIGN I, and that already contrasts with the slightly larger VOWEL SIGN LONG I. So what is the benefit in avoiding the current UCS model for encoding of western Semitic scripts? The (I)C((M)V) model for Samaritan is elegant, enables the representation of Samaritan data, and in our view is the optimum encoding model for Samaritan.

5. Punctuation. A large number of punctuation characters is used in Samaritan.
These form a coherent and well-defined set, often with a diamond-shape to the dot (in most of the better-designed fonts such as that of the Imprimerie Nationale and the font used in the weekly newspaper Å Ä A.B.), and we propose that all of them be encoded as script-specific punctuation. The set as proposed follows the functional description found in Murtonen 1964. The NEQUDAA and ± AFSAAQ interruption are similar to the Hebrew SOF PASUQ and were used originally to separate sentences, but later to mark lesser breaks within a sentence. The AFSAAQ and the NEQUDAA are the oldest Samaritan punctuation marks. They are sometimes combined together ± with AFSAAQ preceding NEQUDAA, or vice-versa, ± with NEQUDAA preceding AFSAAQ, or both ± as NEQUDAA AFSAAQ NEQUDAA. Both of these characters should have the Sentence Terminal property. (Both Murtonen and the back matter of the Samaritan Pentateuch describe AFSAAQ explicitly as áfsaq and íüé ê ăfsåq. In the Markeh Shameri font AFSAAQ is named pause and NEQUDAA is named semicolon.) The ANGED restraint indicates a break somewhat less strong than an AFSAAQ. (Both Murtonen and the back matter of the Samaritan Pentateuch describe ANGED explicitly as ánged and ÉúÇ ç ănged.) The BAU request, prayer shows that the preceding is a humble petition, above all prayers to God. (Both Murtonen and the back matter of the Samaritan Pentateuch describe BAU explicitly as bâ u and Ö è Å ba uw.) The ATMAAU surprise shows that the preceding is unexpected. (Both Murtonen and the back matter of the Samaritan Pentateuch describe ATMAAU explicitly as atmâ u and Ö á å ï Ä atmâhuw.) The μ SHIYYAALAA question shows that the preceding is a question. (Both Murtonen and the back matter of the Samaritan Pentateuch describe SHIYYAALAA explicitly as šîla and ÑüãûÄöâ î šiỹ åˆlåh. In the Markeh Shameri font SHIYYAALAA is named question.) The ABBREVIATION MARK follows an abbreviation. The ZIQAA shout, cry marks expressions calling attention of human beings. 
(Both Murtonen and the back matter of the Samaritan Pentateuch describe ZIQAA explicitly as zîqa and Ñüíâè Ü zi yqåh.)

The π QITSA is similar to the ANNAAU (see below) but is used more frequently. The QITSA marks the end of a section, and may be followed by a blank line to further make the point. It is analogous to the open and closed sections in the Masoretic Pentateuch. It has many glyph variants. One important variant differs significantly from any of the others; this is the MELODIC QITSA which is used to indicate the end of a sentence which one should read melodically. Together with ± AFSAAQ as ± it is used to mark the middle part of the Torah (at Leviticus 7:17). (Murtonen describes QITSA explicitly as qíṣṣa. In the Markeh Shameri font QITSA is named "final pause". The Samaritan spelling is Ñüôë í qiṣṣåh.)

The ZAEF "outburst" marks expressions of vehemence and anger. (Both Murtonen and the back matter of the Samaritan Pentateuch describe ZAEF explicitly as zæˆf and êèü z f.)

The ª TURU "teaching" marks didactic expressions. (Both Murtonen and the back matter of the Samaritan Pentateuch describe TURU explicitly as tûru and Ö ìö ï tûwruw.)

The º ARKAANU "submissiveness" marks expressions of meekness and submission. (Both Murtonen and the back matter of the Samaritan Pentateuch describe ARKAANU explicitly as arkânu and Ö çûä ì Ä arkåˆnuw.)

The Ω SOF MASHFAAT is equivalent to the full stop. (In the Markeh Shameri font SOF MASHFAAT is named "full stop".)

The æ ANNAAU "rest" indicates that a longer time has passed between actions narrated in the sentences which it separates; it is stronger than the AFSAAQ. (Both Murtonen and the back matter of the Samaritan Pentateuch describe ANNAAU explicitly as anâ u and Ö áüôç Ä ănnåhuw. In the Markeh Shameri font ANNAAU is named in error "gutteral yut" but it stands next to "yut dagesha" which is the EPENTHETIC YUT.)

Samaritan distinguishes a small dot from the larger NEQUDAA which is final punctuation like the AFSAAQ. Fossey's example in Figure 5 shows this distinction. The generic U+2E31 WORD SEPARATOR MIDDLE DOT can be used to represent this.
As noted above, the set as proposed follows the functional description found in Murtonen 1964. Reviewers will note that in the punctuation as described in secondary sources (Faulmann 1990 (1880), Reichsdruckerei 1924, von Ostermann 1954) some other configurations are also found. These may be conventional or ad-hoc on the part of the writer. The following is not an exhaustive list. The order is right-to-left.

AFSAAQ + NEQUDAA
NEQUDAA + AFSAAQ + NEQUDAA
QITSA + NEQUDAA
NEQUDAA + AFSAAQ
MELODIC QITSA + AFSAAQ
ZIQAA + AFSAAQ + NEQUDAA + AFSAAQ
ZIQAA + AFSAAQ
QITSA + ATMAAU
QITSA + SHIYYAALAA
SOF MASHFAAT + SHIYYAALAA
NEQUDAA + SOF MASHFAAT

There are other configurations in the MSS which cannot necessarily be composed based on the functional set proposed here. The angle used in BAU, ATMAAU, and SHIYYAALAA, for instance, has not been encoded uniquely since these elements do not necessarily make sense for Samaritan. The elements alone do not have names or functions, and the functions are given as named entities by Murtonen.

6. Character names. While most of the text samples give Hebrew versions of the names of Samaritan characters in the charts, the Samaritan names as transliterated in Kôno et al. 2001 (fig. 9) are preferred here.

7. Reference glyphs. The older font charts shown in a number of the Figures below present a normalized 19th-century font style. Modern Samaritan usage prefers fonts which look more like the actual manuscripts. The font used in the chart here was based on a modern font with a certain amount of rectification to enhance a uniform feel.

8. Unicode character properties.

0800;SAMARITAN LETTER ALAF;Lo;0;R;;;;;N;;;;;
0801;SAMARITAN LETTER BIT;Lo;0;R;;;;;N;;;;;
0802;SAMARITAN LETTER GAMAN;Lo;0;R;;;;;N;;;;;
0803;SAMARITAN LETTER DALAT;Lo;0;R;;;;;N;;;;;
0804;SAMARITAN LETTER IY;Lo;0;R;;;;;N;;;;;
0805;SAMARITAN LETTER BAA;Lo;0;R;;;;;N;;;;;
0806;SAMARITAN LETTER ZEN;Lo;0;R;;;;;N;;;;;
0807;SAMARITAN LETTER IT;Lo;0;R;;;;;N;;;;;
0808;SAMARITAN LETTER TIT;Lo;0;R;;;;;N;;;;;
0809;SAMARITAN LETTER YUT;Lo;0;R;;;;;N;;;;;
080A;SAMARITAN LETTER KAAF;Lo;0;R;;;;;N;;;;;
080B;SAMARITAN LETTER LABAT;Lo;0;R;;;;;N;;;;;
080C;SAMARITAN LETTER MIM;Lo;0;R;;;;;N;;;;;
080D;SAMARITAN LETTER NUN;Lo;0;R;;;;;N;;;;;
080E;SAMARITAN LETTER SINGAAT;Lo;0;R;;;;;N;;;;;
080F;SAMARITAN LETTER IN;Lo;0;R;;;;;N;;;;;
0810;SAMARITAN LETTER FI;Lo;0;R;;;;;N;;;;;
0811;SAMARITAN LETTER TSAADIY;Lo;0;R;;;;;N;;;;;
0812;SAMARITAN LETTER QUF;Lo;0;R;;;;;N;;;;;
0813;SAMARITAN LETTER RISH;Lo;0;R;;;;;N;;;;;
0814;SAMARITAN LETTER SHAN;Lo;0;R;;;;;N;;;;;
0815;SAMARITAN LETTER TAAF;Lo;0;R;;;;;N;;;;;
0816;SAMARITAN MARK IN;Mn;230;NSM;;;;;N;;;;;
0817;SAMARITAN MARK IN-ALAF;Mn;230;NSM;;;;;N;;;;;
0818;SAMARITAN MARK OCCLUSION;Mn;230;NSM;;;;;N;;;;;
0819;SAMARITAN MARK DAGESH;Mn;230;NSM;;;;;N;;;;;
081A;SAMARITAN MODIFIER LETTER EPENTHETIC YUT;Lo;0;R;;;;;N;;;;;
081B;SAMARITAN MARK EPENTHETIC YUT;Mn;230;NSM;;;;;N;;;;;
081C;SAMARITAN VOWEL SIGN LONG E;Mn;23;NSM;;;;;N;;;;;
081D;SAMARITAN VOWEL SIGN E;Mn;23;NSM;;;;;N;;;;;
081E;SAMARITAN VOWEL SIGN OVERLONG AA;Mn;23;NSM;;;;;N;;;;;
081F;SAMARITAN VOWEL SIGN LONG AA;Mn;23;NSM;;;;;N;;;;;
0820;SAMARITAN VOWEL SIGN AA;Mn;23;NSM;;;;;N;;;;;
0821;SAMARITAN VOWEL SIGN OVERLONG A;Mn;23;NSM;;;;;N;;;;;
0822;SAMARITAN VOWEL SIGN LONG A;Mn;23;NSM;;;;;N;;;;;
0823;SAMARITAN VOWEL SIGN A;Mn;23;NSM;;;;;N;;;;;
0824;SAMARITAN MODIFIER LETTER SHORT A;Lo;0;R;;;;;N;;;;;
0825;SAMARITAN VOWEL SIGN SHORT A;Mn;23;NSM;;;;;N;;;;;
0826;SAMARITAN VOWEL SIGN U;Mn;23;NSM;;;;;N;;;;;
0827;SAMARITAN VOWEL SIGN LONG U;Mn;23;NSM;;;;;N;;;;;
0828;SAMARITAN MODIFIER LETTER I;Lo;0;R;;;;;N;;;;;
0829;SAMARITAN VOWEL SIGN LONG I;Mn;23;NSM;;;;;N;;;;;
082A;SAMARITAN VOWEL SIGN I;Mn;23;NSM;;;;;N;;;;;
082B;SAMARITAN VOWEL SIGN O;Mn;23;NSM;;;;;N;;;;;
082C;SAMARITAN VOWEL SIGN SUKUN;Mn;23;NSM;;;;;N;;;;;
082D;SAMARITAN MARK NEQUDAA;Mn;230;NSM;;;;;N;;;;;
0830;SAMARITAN PUNCTUATION NEQUDAA;Po;0;R;;;;;N;;;;;
0831;SAMARITAN PUNCTUATION AFSAAQ;Po;0;R;;;;;N;;;;;
0832;SAMARITAN PUNCTUATION ANGED;Po;0;R;;;;;N;;;;;
0833;SAMARITAN PUNCTUATION BAU;Po;0;R;;;;;N;;;;;
0834;SAMARITAN PUNCTUATION ATMAAU;Po;0;R;;;;;N;;;;;
0835;SAMARITAN PUNCTUATION SHIYYAALAA;Po;0;R;;;;;N;;;;;
0836;SAMARITAN ABBREVIATION MARK;Po;0;R;;;;;N;;;;;
0837;SAMARITAN PUNCTUATION MELODIC QITSA;Po;0;R;;;;;N;;;;;
0838;SAMARITAN PUNCTUATION ZIQAA;Po;0;R;;;;;N;;;;;
0839;SAMARITAN PUNCTUATION QITSA;Po;0;R;;;;;N;;;;;
083A;SAMARITAN PUNCTUATION ZAEF;Po;0;R;;;;;N;;;;;
083B;SAMARITAN PUNCTUATION TURU;Po;0;R;;;;;N;;;;;
083C;SAMARITAN PUNCTUATION ARKAANU;Po;0;R;;;;;N;;;;;
083D;SAMARITAN PUNCTUATION SOF MASHFAAT;Po;0;R;;;;;N;;;;;
083E;SAMARITAN PUNCTUATION ANNAAU;Po;0;R;;;;;N;;;;;
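The property lines above use the semicolon-separated layout of UnicodeData.txt, in which each record has fifteen fields. As a rough sketch of how such records might be read, the Python below parses a few of the lines listed above. The `CharRecord` type and the `parse_record` function are hypothetical names introduced here, and only the first five fields (code point, name, general category, canonical combining class, bidirectional class) are kept.

```python
from typing import NamedTuple


class CharRecord(NamedTuple):
    """The first five fields of one UnicodeData-style record (illustrative type)."""
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_record(line: str) -> CharRecord:
    """Parse one semicolon-separated property line into a CharRecord."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError("expected 15 fields, got %d" % len(fields))
    return CharRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


# A few records copied from the draft property list in this proposal.
SAMPLE = """\
0800;SAMARITAN LETTER ALAF;Lo;0;R;;;;;N;;;;;
0819;SAMARITAN MARK DAGESH;Mn;230;NSM;;;;;N;;;;;
081C;SAMARITAN VOWEL SIGN LONG E;Mn;23;NSM;;;;;N;;;;;
0830;SAMARITAN PUNCTUATION NEQUDAA;Po;0;R;;;;;N;;;;;
"""

records = [parse_record(line) for line in SAMPLE.splitlines() if line]

# List the non-spacing marks together with their draft combining classes.
for rec in records:
    if rec.general_category == "Mn":
        print("U+%04X %s ccc=%d" % (rec.code_point, rec.name, rec.combining_class))
```

Reading the list this way makes it straightforward to group the proposed characters by general category or combining class when reviewing the draft values.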

9. Bibliography

Ben-Hayyim, Ze'ev. 2000. A grammar of Samaritan Hebrew, based on the Recitation of the Law in comparison with the Tiberian and other Jewish traditions. Jerusalem: Hebrew University Magnes Press. ISBN 1-57506-047-7
Faulmann, Carl. 1990 (1880). Das Buch der Schrift. Frankfurt am Main: Eichborn. ISBN 3-8218-1720-8
Fossey, Charles. 1948. Notices sur les caractères étrangers anciens et modernes rédigées par un groupe de savants. Nouvelle édition mise à jour à l'occasion du 21e Congrès des Orientalistes. Paris: Imprimerie Nationale de France.
Haarmann, Harald. 1990. Die Universalgeschichte der Schrift. Frankfurt: Campus Verlag. ISBN 3-593-34346-0
Healey, John F. 1990. The early alphabet. (Reading the past). London: British Museum. ISBN 0-7141-8073-4
Hilton, Susanne. 1982. Oowekeeno oral traditions as told by the late Chief Simon Walkus Sr. Ottawa: National Museums of Canada.
Imprimerie Nationale. 1990. Les caractères de l'Imprimerie Nationale. Paris: Imprimerie Nationale Éditions. ISBN 2-11-081085-8
Jensen, Hans. 1969. Die Schrift in Vergangenheit und Gegenwart. 3., neubearbeitete und erweiterte Auflage. Berlin: VEB Deutscher Verlag der Wissenschaften.
Kôno Rokurô, Chino Eiichi, & Nishida Tatsuo. 2001. The Sanseido Encyclopaedia of Linguistics. Volume 7: Scripts and Writing Systems of the World [= Gengogaku dai ziten (bekkan) sekai mozi ziten]. Tokyo: Sanseido Press. ISBN 4-385-15177-6
Macuch, Rudolf. 1969. Grammatik des samaritanischen Hebräisch. Berlin: Walter de Gruyter.
Murtonen, A. 1964. Materials for a non-masoretic Hebrew Grammar III: A grammar of the Samaritan dialect of Hebrew. (Studia Orientalia: 29) Helsinki: Societas Orientalis Fennica.
Naveh, Joseph. 1997 (1987). Early history of the alphabet: an introduction to West Semitic epigraphy and palaeography. Jerusalem: The Magnes Press, The Hebrew University. ISBN 965-223-436-2
Reichsdruckerei. 1924. Alphabete und Schriftzeichen des Morgen- und Abendlandes, zum allgemeinen Gebrauch mit besonderer Berücksichtigung des Buchgewerbes. Unter Mitwirkung von Fachgelehrten zusammengestellt in der Reichsdruckerei. Berlin: Reichsdruckerei.
von Ostermann, Georg F. 1952. Manual of foreign languages: for the use of librarians, bibliographers, research workers, editors, translators, and printers. Fourth edition, revised and enlarged. New York: Central Book Company.

10. Acknowledgements. This project was made possible in part by a grant from the U.S. National Endowment for the Humanities, which funded the Universal Scripts Project (part of the Script Encoding Initiative at UC Berkeley) in respect of the Samaritan encoding. Any views, findings, conclusions or recommendations expressed in this publication do not necessarily reflect those of the National Endowment for the Humanities.

Figures

Figure 1. The Samaritan script, from Faulmann 1990 (1880), with Hebrew names, numeric value, and punctuation.

Figure 2. The Samaritan script, from the Reichsdruckerei 1924. It is worth noting that in this book the Hebrew script is given on a different page under a different rubric, showing the Square Script, Rashi, and Weaver-German variants, as well as German and Polish handwritten styles.

Figure 3. The Samaritan alphabet, from von Ostermann 1954. This book is a handbook for librarians who need to identify and transliterate scripts. The glyphs, vowel samples, and punctuation all appear to have been taken from the Reichsdruckerei materials.

Figure 4. A Samaritan inscription from Naveh 1997. The punctuation marks ß AFSAAQ and NEQUDAA are shown.

Figure 5. A Samaritan text from Fossey 1948. The small WORD SEPARATION POINT is shown along with the larger punctuation marks ± AFSAAQ and NEQUDAA.

Figure 6. Sample text from the Imprimerie Nationale 1990, showing three styles and two sizes of Samaritan text; ± AFSAAQ and NEQUDAA are also shown.

Figure 7. Text from Healey 1990, showing text from a Samaritan Bible (Genesis 21:4-14), in a manuscript dating from the 13th century CE held in the Chester Beatty Library in Dublin (MS 751 27v).

Figure 8. Text from Kôno 2001 taken from Ratson Tsedaqah's 1982 edition of Tōrāh Tmīmāh, showing Samaritan vowel signs.

Figure 9. Text from Kôno 2001 showing various examples of Samaritan inscriptional and book text, phonetic transcription and names, and Square Hebrew equivalents.

Figure 10b. Discussion of Samaritan punctuation from Murtonen 1964. Murtonen does not have adequate fonts for the punctuation characters.

Figure 11. Discussion in Hebrew of Samaritan punctuation marks. Shown are, from right to left, ± AFSAAQ, ANGED, æ ANNAAU, [º] ARKAANU, BAU, μ SHIYYAALAA, ZIQAA, ZAEF, ª TURU, and ATMAAU.

Figure 12. Samaritan manuscript 201 from Ashqelon, Israel, CE 1189. The text shown is Leviticus.

Figure 13. A Samaritan manuscript. Here the WORD SEPARATION POINT is used between words, and NEQUDAA is used at the beginnings of some lines in front of AFSAAQ ± and at the end of some lines after AFSAAQ ±.

Figure 14. Sample from the weekly Samaritan newspaper, Å Ä (A.B.).

Figure 15. A page from the Book of Genesis.

Proposal for the Universal Character Set
Michael Everson & Mark Shoulson

Row 08: SAMARITAN (DRAFT)

[Code chart: columns 080 to 083, rows 0 to F; the glyphs are not recoverable from the extracted text.]

hex  Name
00   SAMARITAN LETTER ALAF
01   SAMARITAN LETTER BIT
02   SAMARITAN LETTER GAMAN
03   SAMARITAN LETTER DALAT
04   SAMARITAN LETTER IY
05   SAMARITAN LETTER BAA
06   SAMARITAN LETTER ZEN
07   SAMARITAN LETTER IT
08   SAMARITAN LETTER TIT
09   SAMARITAN LETTER YUT
0A   SAMARITAN LETTER KAAF
0B   SAMARITAN LETTER LABAT
0C   SAMARITAN LETTER MIM
0D   SAMARITAN LETTER NUN
0E   SAMARITAN LETTER SINGAAT
0F   SAMARITAN LETTER IN
10   SAMARITAN LETTER FI
11   SAMARITAN LETTER TSAADIY
12   SAMARITAN LETTER QUF
13   SAMARITAN LETTER RISH
14   SAMARITAN LETTER SHAN
15   SAMARITAN LETTER TAAF
16   SAMARITAN MARK IN
17   SAMARITAN MARK IN-ALAF
18   SAMARITAN MARK OCCLUSION
19   SAMARITAN MARK DAGESH
1A   SAMARITAN MODIFIER LETTER EPENTHETIC YUT
1B   SAMARITAN MARK EPENTHETIC YUT
1C   SAMARITAN VOWEL SIGN LONG E (fatha al-nida)
1D   SAMARITAN VOWEL SIGN E
1E   SAMARITAN VOWEL SIGN OVERLONG AA (fatha al-ima)
1F   SAMARITAN VOWEL SIGN LONG AA
20   SAMARITAN VOWEL SIGN AA
21   SAMARITAN VOWEL SIGN OVERLONG A (fatha al-iha)
22   SAMARITAN VOWEL SIGN LONG A
23   SAMARITAN VOWEL SIGN A
24   SAMARITAN MODIFIER LETTER SHORT A
25   SAMARITAN VOWEL SIGN SHORT A (fatha)
26   SAMARITAN VOWEL SIGN LONG U (damma)
27   SAMARITAN VOWEL SIGN U
28   SAMARITAN MODIFIER LETTER I
29   SAMARITAN VOWEL SIGN LONG I (kasra)
2A   SAMARITAN VOWEL SIGN I
2B   SAMARITAN VOWEL SIGN O
2C   SAMARITAN VOWEL SIGN SUKUN
2D   SAMARITAN MARK NEQUDAA
2E   (This position shall not be used)
2F   (This position shall not be used)
30   SAMARITAN PUNCTUATION NEQUDAA
31   SAMARITAN PUNCTUATION AFSAAQ
32   SAMARITAN PUNCTUATION ANGED
33   SAMARITAN PUNCTUATION BAU
34   SAMARITAN PUNCTUATION ATMAAU
35   SAMARITAN PUNCTUATION SHIYYAALAA
36   SAMARITAN ABBREVIATION MARK
37   SAMARITAN PUNCTUATION MELODIC QITSA
38   SAMARITAN PUNCTUATION ZIQAA
39   SAMARITAN PUNCTUATION QITSA
3A   SAMARITAN PUNCTUATION ZAEF
3B   SAMARITAN PUNCTUATION TURU
3C   SAMARITAN PUNCTUATION ARKAANU
3D   SAMARITAN PUNCTUATION SOF MASHFAAT
3E   SAMARITAN PUNCTUATION ANNAAU
3F   (This position shall not be used)

A. Administrative
1. Title
   Proposal to add the Samaritan alphabet to the BMP of the UCS
2. Requester's name
   UC Berkeley Script Encoding Initiative (Universal Scripts Project); Authors: Michael Everson and Mark Shoulson
3. Requester type (Member body/liaison/individual contribution)
   Liaison contribution.
4. Submission date
   2008-01-25
5. Requester's reference (if applicable)
6. Choose one of the following:
6a. This is a complete proposal
6b. More information will be provided later
   No.

B. Technical General
1. Choose one of the following:
1a. This proposal is for a new script (set of characters)
1b. Proposed name of script
   Samaritan.
1c. The proposal is for addition of character(s) to an existing block
1d. Name of the existing block
2. Number of characters in proposal
   61.
3. Proposed category (A-Contemporary; B.1-Specialized (small collection); B.2-Specialized (large collection); C-Major extinct; D-Attested extinct; E-Minor extinct; F-Archaic Hieroglyphic or Ideographic; G-Obscure or questionable usage symbols)
   Category A.
4a. Is a repertoire including character names provided?
4b. If YES, are the names in accordance with the character naming guidelines in Annex L of P&P document?
4c. Are the character shapes attached in a legible form suitable for review?
5a. Who will provide the appropriate computerized font (ordered preference: True Type, or PostScript format) for publishing the standard?
   Michael Everson.
5b. If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools used:
   Michael Everson, Fontographer.
6a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?
6b. Are published examples of use (such as samples from newspapers, magazines, or other sources) of proposed characters attached?
7. Does the proposal address other aspects of character data processing (if applicable) such as input, presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?
8. Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script. Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also see Unicode Character Database http://www.unicode.org/public/unidata/unicodecharacterdatabase.html and associated Unicode Technical Reports for information needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.
   See above.

C. Technical Justification
1. Has this proposal for addition of character(s) been submitted before? If YES, explain.
   No.
2a. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)?
2b. If YES, with whom?
   Alan Crown, Osher Sassoni, Benny Tsedaka
2c. If YES, available relevant documents
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included?
   Ecclesiastical and cultural communities.

4a. The context of use for the proposed characters (type of use; common or rare)
    Characters are used to write the Samaritan language.
4b. Reference
5a. Are the proposed characters in current use by the user community?
5b. If YES, where?
    In Israel and the West Bank by Samaritans; also by scholars, ecclesiastical researchers, and librarians.
6a. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?
6b. If YES, is a rationale provided?
6c. If YES, reference
    Accordance with the Roadmap; RTL script with modern use.
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?
8a. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?
    No.
8b. If YES, is a rationale for its inclusion provided?
8c. If YES, reference
9a. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?
    No.
9b. If YES, is a rationale for its inclusion provided?
9c. If YES, reference
10a. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
    No.
10b. If YES, is a rationale for its inclusion provided?
10c. If YES, reference
11a. Does the proposal include use of combining characters and/or use of composite sequences (see clauses 4.12 and 4.14 in ISO/IEC 10646-1: 2000)?
11b. If YES, is a rationale for such use provided?
    No.
11c. If YES, reference
11d. Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
    No.
11e. If YES, reference
12a. Does the proposal contain characters with any special properties such as control function or similar semantics?
    No.
12b. If YES, describe in detail (include attachment if necessary)
13a. Does the proposal contain any Ideographic compatibility character(s)?
    No.
13b. If YES, is the equivalent corresponding unified ideographic character(s) identified?